ML to help solve FOL model-checking problems tabula rasa

Main Finding so far:

Neural MCTS solves specific combinatorial problem instances through gamification (we tested HSR [Sniedovich (2003), The Joy of Egg-Dropping] and QSAT [Kleinberg/Tardos]).

Introduction to Zermelo Gamification

Authors: Karl Lieberherr and Ruiyang Xu

Our research is inspired by AlphaZero's superhuman ability to play Chess and Go using a combination of MCTS (Monte Carlo Tree Search) and neural networks. We believe that the neural MCTS algorithm used in AlphaZero can be adapted to search effectively in other search spaces. We currently focus on using neural MCTS to search for solutions to first-order model-checking problems (which include many combinatorial problems) and hope for search performance similar to AlphaZero's for Chess. Our goal is to search for a solution tabula rasa, i.e., using only the mathematical problem definition. While any first-order model-checking problem can be translated into a game (hint: use the semantic game defined by the predicate logic sentence of the model-checking problem), the resulting games usually have action spaces that are too large for a direct application of neural MCTS.
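To make the quantifier-based translation concrete, here is a minimal sketch of Hintikka's semantic game for a quantified Boolean formula (the QSAT case of FOL model checking, where the domain is {False, True}). The function names and the example sentence are illustrative assumptions, and the exhaustive backward induction shown here is exactly what neural MCTS is meant to replace on large instances.

def proponent_wins(prefix, matrix, assignment=None):
    """Decide the semantic game by backward induction (Zermelo's theorem).

    prefix     -- list of (quantifier, variable) pairs, e.g. [('E', 'x'), ('A', 'y')]
    matrix     -- function from a variable->bool dict to True/False
    assignment -- values chosen so far
    """
    assignment = assignment or {}
    if len(assignment) == len(prefix):
        return matrix(assignment)              # game over: evaluate the quantifier-free body
    quantifier, var = prefix[len(assignment)]
    moves = [dict(assignment, **{var: b}) for b in (False, True)]
    if quantifier == 'E':                      # Proponent chooses values for existential variables
        return any(proponent_wins(prefix, matrix, m) for m in moves)
    else:                                      # Opponent chooses values for universal variables
        return all(proponent_wins(prefix, matrix, m) for m in moves)

# Example: the true sentence "exists x, for all y: (x or y) implies x"
prefix = [('E', 'x'), ('A', 'y')]
matrix = lambda a: (not (a['x'] or a['y'])) or a['x']
print(proponent_wins(prefix, matrix))          # True: the Proponent wins by playing x = True

The Proponent has a winning strategy exactly when the sentence is true in the model, which is what lets a winning strategy be mapped back to an answer for the model-checking instance.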

The neural MCTS algorithm builds "self-teaching" neural nets that learn from asymmetric self-play. Go and Chess are symmetric games: a strategy for playing White optimally can be used to derive a strategy for playing Black optimally. However, the games that come from model-checking problems are not necessarily symmetric; the two players might have significantly different tasks. This affects self-play because the two players face tasks of different complexity.

We have succeeded in translating certain combinatorial problems into inputs for the neural MCTS algorithm so that the algorithm converges to the winning strategy for both players and correctly solves the combinatorial problem for a specific instance. We verify this by evaluating an AlphaZero-style algorithm in a ground-truth setting.

Although currently each problem instance is solved separately, we hope that we can get a more general solution that works on a family of related instances.

We call our framework, which starts from a combinatorial problem, creates a game, solves the game with neural MCTS, and maps the winning strategy back to a solution of the combinatorial problem, the Zermelo Gamification (ZG). It is a new framework for solving combinatorial problems that leverages the outstanding search capabilities of neural MCTS as demonstrated by DeepMind's AlphaZero.

Gamification is a well-studied topic when the players are human neural networks. We attempt a new kind of gamification where the players are artificial neural networks. For human neural networks, the focus is on motivating humans to contribute by making the game engaging. Gamification for human neural networks is used to train or educate humans in a specific domain (cite Magy Seif el-Nasr) or to use them as problem solvers (cite Seth Cooper, Fold-it). For artificial neural networks, the focus is on creating a game and choosing neural networks for learning and playing the game that are the right fit for solving the problem at hand.

Our objective is not necessarily to improve on the state-of-the-art hand-crafted problem solvers in specific areas but to illustrate that a generic algorithm (neural MCTS) can solve well-known problems tabula rasa. The hope is that neural MCTS will help solve future algorithmic problems that have not yet been solved by humans. We view neural MCTS as a future helper for humans solving algorithmic problems.


Note 1: Gamification is traditionally used to encourage participation by humans. We use gamification to make problems solvable by machines. This puts new constraints on how the gamification is done, starting from the basic quantifier-based gamification of semantic games [Hintikka]. Zermelo Gamification is about how to gamify and how to design the neural MCTS architecture so that neural MCTS can succeed in solving an instance of the FOL model-checking problem.

Note 2: We name the gamification after the logician Ernst Zermelo, who is well known for Zermelo's Theorem for Chess and Zermelo-Fraenkel set theory.

Justification of the Zermelo Gamification (explanation capability of neural networks)

Why don't we try to solve the combinatorial problem directly rather than by translating it into a game using the Zermelo Gamification? ML is often accused of producing black-box solutions that don't explain themselves. We design the neural network with enough hooks so that we can better observe what it does. The hooks are created through the game transformation, which trades a complex question (e.g., is there a solution?) for a series of simpler questions (e.g., am I losing when I make this choice?). There is a trend in ML to use neural networks with hooks to make them more understandable; see Hanspeter Pfister's work at Harvard.

The constructive nature of the games we obtain through the semantic-games approach lets us explore "why" questions, for example: why does this choice not lead to a solution? You can play the game and see an example gameplay that indeed does not lead to a solution. The advantage of gamification is that the gameplay demonstrates that a specific instance has or does not have a solution. This constructive demonstration is useful to humans who try to understand the problem and is much better than just being told: no, there is no solution. Of course, here we assume that neural MCTS finds the optimal strategy.

This is a key benefit of ZG: the explanatory power of the neural networks helps us better understand the problem, which has, for example, educational benefits. Our neural networks provide constructive answers both for a positive problem and for a negative problem (one that has no solution). For a negative problem, the neural network demonstrates to anyone who tries to construct a solution that the attempt results in a contradiction. This is much more useful than simply the answer no. The improved explanatory power does not provide a proof, only a demonstration that the problem is unsolvable. The demonstration is given during an attempt to construct a solution and requires a tightrope walk by the neural networks to produce the contradiction. There is a danger that no contradiction is produced, which would make it appear that there is a solution. In our case, the neural networks don't just say the equivalent of "this is a cat"; they give a step-by-step explanation of why it is a cat.

With the Zermelo Gamification, we generalize the initial construction problem to an initial/complement construction problem. This generalization is more useful for problems whose games have many alternating moves. We observe an application of Polya's inventor's paradox, which says that a more general question is sometimes easier to solve. In our case, the only way machine learning can succeed is if we attack the initial problem and its complement simultaneously: they need each other to find the solution and converge to the optimal policy. The learning curves of the HSR games show that the Proponent and the Opponent help each other improve over time, with occasional setbacks.
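For context, HSR is the Highest Safe Rung (egg-dropping) problem analyzed by Sniedovich (2003): with k test jars and q questions, how many rungs can the Proponent handle? Below is a minimal sketch of the classical recurrence; the function names and the solvability check are illustrative assumptions about how an instance relates to the game, not our gamification code.

from functools import lru_cache

@lru_cache(maxsize=None)
def max_rungs(k, q):
    """Maximum number of rungs whose highest safe rung can be determined
    with k jars and q questions (drops)."""
    if k == 0 or q == 0:
        return 0
    # One drop: if the jar breaks, continue below with (k-1, q-1);
    # if it survives, continue above with (k, q-1); the tested rung adds 1.
    return max_rungs(k - 1, q - 1) + max_rungs(k, q - 1) + 1

def instance_solvable(n, k, q):
    """The Proponent should win the HSR game on n rungs iff n fits the budget."""
    return n <= max_rungs(k, q)

print(max_rungs(2, 4))               # 10: two jars and four questions cover 10 rungs
print(instance_solvable(11, 2, 4))   # False: the Opponent should win this instance

Such ground-truth values are what allow us to check whether the learned Proponent and Opponent strategies are actually correct.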

Simulatability (Murdoch et al. 2018).  This section is work in progress.

Requirements for the Zermelo Gamification

The Zermelo Gamification is more complicated than it looks. It requires a sophisticated compiler to map the combinatorial problem formulation into a game that neural MCTS solves successfully. Several concerns apply; for now, we humans have to balance them.

One limitation of our current neural MCTS algorithm is that it has competitors, such as alpha-beta pruning. However, we believe that our modest hardware (one desktop with 4 cores) is the reason neural MCTS does not yet beat alpha-beta pruning here, as it does for Chess and Go.
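For reference, this is the kind of alpha-beta pruning baseline such a comparison uses on win/lose games (value +1 for a Proponent win, -1 for an Opponent win). The game interface (is_terminal, value, legal_moves, apply) is an assumed abstraction, not our actual baseline code.

def alphabeta(game, state, proponent_to_move, alpha=-1, beta=+1):
    """Game-theoretic value of `state` computed with alpha-beta pruning."""
    if game.is_terminal(state):
        return game.value(state)               # +1 if the Proponent wins, -1 otherwise
    if proponent_to_move:                      # Proponent maximizes
        best = -1
        for move in game.legal_moves(state):
            child = game.apply(state, move)
            best = max(best, alphabeta(game, child, False, alpha, beta))
            alpha = max(alpha, best)
            if alpha >= beta:                  # the Opponent would never allow this branch
                break
        return best
    else:                                      # Opponent minimizes
        best = +1
        for move in game.legal_moves(state):
            child = game.apply(state, move)
            best = min(best, alphabeta(game, child, True, alpha, beta))
            beta = min(beta, best)
            if alpha >= beta:
                break
        return best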

Evaluating Zermelo Gamification

We want to measure whether ZG is successful and converges to a winning strategy. For Chess and Go we know that AlphaZero finds a good strategy, but we don't know whether it is a winning one. Having a winning strategy means making only correct moves. A move is correct for player P if it starts in a winning position for P and ends in a position that is still winning for P. A move is also correct for player P if it starts in a lost position for P. According to this definition, when you are lost you cannot make a mistake; but if your opponent makes an incorrect move, you are then in a winning position and must make only correct moves to stay mistake-free.

Making only correct moves in a winning position is correlated with a high level of skill.
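The correctness criterion can be checked mechanically against a ground-truth solver (such as the exhaustive or alpha-beta solvers sketched earlier). Below is a minimal sketch; the oracle winning(state, player), which says whether the player to move has a winning strategy, the game interface, and the trajectory format are illustrative assumptions, not our evaluation code.

def move_is_correct(game, state, player, move, winning):
    """A move by `player` is correct if `player` was already lost, or if the
    resulting position is still winning for `player`."""
    if not winning(state, player):
        return True                            # from a lost position, no move counts as a mistake
    next_state = game.apply(state, move)
    opponent = game.other(player)
    # After a correct move the opponent, now to move, must be in a lost position.
    return not winning(next_state, opponent)

def correctness_ratio(game, trajectory, winning):
    """Fraction of correct moves along a self-play trajectory of
    (state, player, move) triples -- one possible convergence measure."""
    checks = [move_is_correct(game, s, p, m, winning) for (s, p, m) in trajectory]
    return sum(checks) / len(checks)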


Our publications


What we build on

Gerald Tesauro: Programming backgammon using self-teaching neural nets  http://www.bkgm.com/articles/tesauro/ProgrammingBackgammon.pdf

AlphaGo

AlphaZero

Expert Iteration: Deep Pepper: Expert Iteration based Chess agent in the Reinforcement Learning Setting https://arxiv.org/pdf/1806.00683.pdf

More Related Work

Elias Khalil: Learning Combinatorial Optimization Algorithms over Graphs (uses reinforcement learning and graph embedding). https://papers.nips.cc/paper/7214-learning-combinatorial-optimization-algorithms-over-graphs


http://www.cs.utexas.edu/~hunt/class/2015-fall/cs395t/slides/QBF.pdf (QSAT info)

How are overfitting and underfitting an issue with neural MCTS?

https://machinelearningmastery.com/overfitting-and-underfitting-with-machine-learning-algorithms/ 

From AlphaGo to Power System AI by Fangxing Li and Yan Du, IEEE Power and Energy Magazine

https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8295075&tag=1 

Analysis of HSR:

Moshe Sniedovich (2003). OR/MS Games: 4. The Joy of Egg-Dropping in Braunschweig and Hong Kong. INFORMS Transactions on Education 4(1):48-64. http://dx.doi.org/10.1287/ited.4.1.48